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Abstract. In diffraction imaging, one is tasked with reconstructing a signal from its power 
spectrum. To resolve the ambiguity in this inverse problem, one might invoke prior knowl- 
edge about the signal, but phase retrieval algorithms in this vein have found limited success. 
One alternative is to create redundancy in the measurement process by illuminating the sig- 
nal multiple times, distorting the signal each time with a different mask. Despite several 
recent advances in phase retrieval, the community has yet to construct an ensemble of masks 
which uniquely determines all signals and admits an efficient reconstruction algorithm. In 
this paper, we leverage the recently proposed polarization method to construct such an en- 
semble. We also present numerical simulations to illustrate the stability of the polarization 
method in this setting. In comparison to a state-of-the-art phase retrieval algorithm known 
as PhaseLift, we find that polarization is much faster with comparable stability. 



1. Introduction 

In many applications, one wishes to reconstruct a signal from the magnitudes of its Fourier 
coefficients. This problem is known as phase retrieval, and it has been instrumental to many 
important scientific advances, including the Nobel Prize-winning work that leveraged X- 
ray diffraction to establish the double helix structure of DNA [22]. Of course, given only 
the magnitudes of a signal's Fourier coefficients, one does not have enough information to 
recover the signal — while the Fourier transform is injective, the point-wise absolute value is 
not. As such, one is inclined to use a priori knowledge of the signal, and hope it is then 
uniquely determined by the Fourier magnitudes. For example, to deduce the structure of 
DNA, Watson and Crick [22] applied certain chemical assumptions, along with a knowledge 
of van der Waals interactions between atoms. 

In order to image more exotic molecules, such assumptions are difficult to apply, and so 
there has been quite a bit of work attempting to exploit more general assumptions (e.g., 
positivity or support constraints). To account for this sort of prior information, the most 
popular phase retrieval algorithms are modifications of Gerchberg and Saxton's original ap- 
proach [T3], which alternates between the time and frequency domains, iteratively correcting 
the current guess by imposing time-domain assumptions or scaling Fourier coefficients to 
match the measured data. Marchesini [T7] surveys and compares the various modifications, 
but they all have a tendency to stall in local minima. 

To address this issue, Candes, Eldar, Strohmer and Voroninski [7] proposed an alternative 
methodology whereby nonuniqueness is overcome not by prior information, but by addi- 
tional illuminations. For each illumination, a different mask (or grating) is used to distort 
the appearance of the object in question; in mathematical parlance, each mask acts as a 
multiplication operator on the desired signal before the Fourier transform. Furthermore, 
Candes et al. constructed three masks such that the corresponding illuminations uniquely 
determine almost every signal up to a global phase factor; a fourth illumination is necessary 
to uniquely determine all signals [5]. However, the phase retrieval algorithms they propose 
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are not guaranteed to reconstruct the desired signal from these three illuminations, even if 
the signal is uniquely determined by the illuminations. 

Let F* : £(Z M ) — > £(Z M ) denote the discrete Fourier transform (DFT) defined by 

(F*x)(m) Yl <™>~ 2nimm ' IM Vm e Z M . 

In this paper, we follow the lead of [7J to find masks {D k }£~Q (i.e., multiplication operators 
on i(Z M ) = C M ) such that every signal x e !(Zm) can be efficiently reconstructed, up to a 
global phase factor, from measurements of the form 

{l^^l 2 }^ 1 = = {\(x,D k f m )f}«S Q ] 

where f m denotes the complex sinusoid {^e 2mmm '^ M } m i & z M - Specifically, we will design 
the masks in such a way that allows for a new reconstruction technique to work, namely 
the polarization-based technique introduced in pQ. As established in pQ, every signal x can 
be stably reconstructed from \{x, fi)\ 2 if the measurement vectors ipi are constructed us- 
ing a particular random process. In this paper, we show how to fulfill the design criteria 
that measurement vectors be of the form Dj,f m while still allowing for polarization-based 
reconstruction. If anything, this demonstrates that polarization is flexible enough to prov- 
ably accommodate certain measurement design requirements, which are bound to arise from 
various applications. 

In the next section, we briefly review the main ideas behind the polarization-based tech- 
nique, and we show how to apply them with Fourier masks. In particular, we reduce the 
problem of building masks to a problem in additive combinatorics: Find a small subset of 
Zjvf with small Fourier bias. In Section 3, we use concentration-of-measure arguments to 
solve this problem, culminating in the construction of O(logM) masks that uniquely de- 
termine every signal in C M up to a global phase factor. Next in Section 4, we compare 
the performance of polarization to a state-of-the-art phase retrieval algorithm known as 
PhaseLift [TIE]. Here, we show that polarization provides a comparable amount of stability 
with Fourier masks, while being orders of magnitude faster. We conclude in Section 5 with 
some remarks. 



2. HOW TO POLARIZE FOURIER MASKS 

The main contribution of this paper is that the ideas in pQ can be leveraged to uniquely 
measure signals with masked Fourier transforms. In this section, we summarize these ideas 
and explain how we apply them. In [1] , the signal x is measured with two ensembles of vectors 
&e ^ C M . Here, $y has the property that every subcollection of size M spans C M ; such 
ensembles are called full spark frames, of which there are several known constructions [2j [18] . 
To determine $g, first define a graph G = (V, E) whose vertices i E V index the measurement 
vectors <fi G $y. Then each edge E E corresponds to three members of namely 
{ipi + oj r ipj} 2 =0 , where u> = e 27 ™/ 3 . These edges are useful because they determine the 
relative phase between the inner products (x,(pi) and (x,ipj). Specifically, a version of the 
polarization identity gives that 

I 2 _ i 2 

{x,<pi)(x,<pj) = -^2u r \(x,<pi) +u r (x,¥j)\ 2 = -^2u r \(x,<pi + uj r pj)\ 2 , (1) 
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and so the relative phase is calculated by normalizing this quantity (provided it's nonzero). 
On the other hand, if an inner product (x,tpi) is zero, then for every vertex j adjacent 
to i in G, the members of corresponding to are useless, since the relative phases 
they are intended to capture are not well defined. Indeed, x being orthogonal to ipi has 
the effect of deleting the vertex i (and its edges) from the graph G. However, since $y is 
full spark, x is orthogonal to at most M — 1 vertex vectors, thereby limiting the damage to 
our graph. Furthermore, for any remaining connected component S C V of size > M, we 
can propagate the relative phases to determine {(x, (pi)}ies up to a global phase factor, and 
since {(pi}ies necessarily span C M , we can determine x from {(x, y9j)}j e s by least-squares 
estimation. Interestingly, there will always be a connected component of size > M after 
the removal of any M — 1 vertices, provided the graph G is regular with a sufficiently large 
spectral gap. To be clear, the spectral gap of a <i-regular graph G is determined by the 
eigenvalues Ai > A 2 > • • ■ of its adjacency matrix: 

spg(G) = 3 f Ai - max |A, 

and the existence of the requisite connected component is given by the following classical 
result in spectral graph theory (e.g., see Lemma 5.2 of [15]): 

Lemma 1. Consider a d-regular graph G of n vertices. For alls < spg(G)/6, removing any 
edn edges from G results in a connected component of size > (1 — 2s/ spg(G))n. 

In summary, in order to apply the polarization trick in the setting of masked DFTs, it 
suffices to (i) construct a full spark ensemble <3>y with masked DFTs, and (ii) construct a 
graph G with a sufficiently large spectral gap and whose corresponding edge vectors $ e can 
also be implemented with masked DFTs. We quickly address (i) with the following result: 

Lemma 2. For each k G {0, ...,K — 1}, pick some nonzero a k G C and define Dk : = 
diag{o^}™=o- Letf m denote the complex sinusoid {±e 2 ™ mm '/ M } m , e z M Then {D k f m }*^%^ 
is full spark if and only if a k a k , is not an mth root of unity for any pair of distinct 
k,k' G {0,...,K- 1}. 

Proof. The M x KM matrix whose columns are Dkf m has Vandermonde form, as does each 
of its M x M submatrices. These submatrices are all invertible precisely when their deter- 
minants are nonzero, that is, when {otke 2mm ^ M }k=o m=l are distinct (by the Vandermonde 
determinant formula). The lemma immediately follows. □ 

The condition above is satisfied with probability 1 if the afc's are drawn uniformly at 
random from the complex unit circle. For a deterministic alternative, it suffices to take 
oik = e 2mk / KM . Now that we have established how to construct masks Dk such that $y = 
{Dkfm)k=o m=o ^ S s P ar k, we turn to solving (ii). Before describing our construction of 
G, we first motivate it by illustrating how polarized combinations of our vertex vectors can 
be expressed using masked DFTs: 

Lemma 3. Take to = e 2m ^ 3 , let E = diag{e 2mm ^ M }^~}_denote the modulation operator, and 



Then 



consider $y = {Dkf m }k=o m=o> as defined in Lemma 

Dkf m + co r D k ,f m , = (D k + uj r E m '- m D k ,)U 
for all k, k! G {0, . . . , K — 1} and m, m! G Z M . 
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Proof. Consider the £th entry of D k if m r. 

M M 
Then D k f m + u r D k ,f m , = D k f m + u r E m '- m D k ,f m = (D k + u r E m '~ m D k ,)f m . □ 

When implementing the modulation trick of Lemma [3] using DFTs, we will have to fix 
m' — m (so as to apply a fixed mask) and let m vary (since we will mask an entire DFT). 
As such, we intend to use auxiliary masks of the form {D k + u r E a D k /}l =0 , thereby drawing 
an edge between every (k,m) and (k',m') such that m' — m = a mod M. This informs our 
decision of how to construct the graph G, since it enables a masked-DFT implementation: 

Definition 4. We construct G as follows: Pick A C 7L M such that ^ A and A = —A. 

Then (k,m) and (k',m') are adjacent precisely when m' — m G A. 

Note that for each a G A, the corresponding edges between {(k, m)} me z M and {(k', m')} m i £ z 
are implemented with the masks {D k + u r E a D k >}l =0 . As such, A is the set of modulations 
used to combine the K original masks D k into 3( x ^" 1 )|v4| auxiliary masks. Also note that 
G has no loops since ^ A, G is not directed since A = —A, and furthermore every vertex 
has degree if so G is regular. It remains to show that G has a sufficiently large spectral 
gap. To simplify this analysis, we first relate the spectral gap to a fundamental concept in 
additive combinatorics called the Fourier bias. As in [20], take A C r L M and let 1a denote 
the characteristic function of A. Then the Fourier bias of A is defined as follows: 

P|| u :=max|(F*l A )(m)|. 

Fourier bias is used in additive combinatorics to measure pseudorandomness; here, the main 
idea is that correlation with any complex sinusoid indicates regularity, which is not typically 
exhibited by random sets. As indicated earlier, the spectral gap of G is intimately related 
to the Fourier bias of A: 

Lemma 5. Consider the graph G defined in Definition^ The spectral gap of G is 

spg(G) = l-— 

where || • || u denotes Fourier bias. 

Proof. To determine the spectral gap, we first determine the adjacency matrix W of G. 
For any pair k,k' G {0,. . . ,K — 1}, the adjacency rule for (k,m) and (k',m') is whether 
m! — m G A. As such, the (k, k')th block of W is circulant, with each row being a translation 
of 1a- This gives the expression W = J <8> circ(l J 4), where J is the K x K all-ones matrix 
and ® denotes the Kronecker product. One useful property of the Kronecker product is that 
its eigenvalues are products of eigenvalues: 

A W) (^)=A i (J)A i (circ(l A )). 

Here, we note that the only nontrivial eigenvalue of J is K, and the eigenvalues of circ(l^) 
are the entries of the Fourier transform MF*1a- As such, the nontrivial eigenvalues of W 
are given by 

X(i, m )(W) = Ax(J)A m (circ(l A )) = KM(F*l A )(m) = K ^ l^m^e" 2 ™ '' M . 
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By the triangle inequality, we then have 



\X M (W)\ =K 



l A (m')e- 27rimm ' /M <K \lA(m')\ = K\A\, 



m'eZ M 



with equality when m = 0. This means the largest eigenvalue of W is Ai = ^|^.|, which 
corresponds to the all-ones eigenvector, and so the spectral gap of G is 

BPg(G) = i(A: -max|A,|) = ^ (k\ A\ - max \KM(F>1 A ) (m)|) = 1 - ^||.4||„, 

as claimed. □ 

At this point, we have reduced the problem of finding injective Fourier masks to finding a 
small set A C of sufficiently small Fourier bias (i.e., a problem in additive combinatorics). 
In the next section, we use the following random process to construct this set: Draw B C r L M 
at random and take A := B U (— B) \ {0}. With the appropriate choice of distribution, we 
construct A of size O(logM) and Fourier bias (9( lo fi^ ), thereby producing a graph G with 
a spectral gap that is bounded away from zero (see Theorem [9]). We also show that the 
spectral gap is bounded away from zero only if A has size f2(logM), thereby demonstrating 



the optimality of our analysis (see Theorem 11). Overall, letting K be a sufficiently large 
constant, the techniques of this paper use a total of K + 3( K ^)\A\ = e(logM) Fourier 
masks to uniquely determine any signal (see Theorem 10). 



3. Small sets with small Fourier bias 

In the previous section, we reduced the main problem of this paper to one in additive com- 
binatorics: Find a small set A C 7L M with small Fourier bias ||^4||u = max m ^o |(-^*1a)(^)|- 
In our case, we want A to satisfy two additional properties: ^ A and A = —A. To account 
for these, we will approach this problem by first drawing fl C at random, and then 
taking A := B U (— B) \ {0}. We start with the following lemma: 

Lemma 6. Suppose the entries of 1b are independent, identical Bernoulli random variables 
with mean c . Then the following simultaneously hold with high probability: 

(i) sn(-s) = 

(ii) £ B 

(iii) ^clogM < \B\ < fclogM 
Proof. First, a union bound gives 



Pr [B n (-B) ^ 0^ = Pr (3m G Z M s.t. m, —m G B 



< M Pr [m G B Pr -me B 



c 2 log 2 M 
M ' 



Next, we have Pr(0 G B) = c -^. For (iii), we apply a multiplicative form of the Chernoff 
bound (see (6) and (7) in [2]): Let X be a sum of independent 0-1 random variables. Then 

Pr (\X - ELY] I > SE[X] \ < 2e- s2E[x]/3 

for any choice < 5 < 1. Here, we let X := ^] m6Z 1b(^i) — \B\ and take 6 = 1/2. Then 

Pr M|S| - clogMj > ^clogil/) < 2e" (clogM)/12 = 2M" c/12 . 
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The result then follows from a union bound. □ 
Note that in the event of Lemma |6l we have 1a = 1b + 1-b, and so 



„ = max |(F*l A )(m)| < max |(F*l B )(m)| + \(F*l_ B )(m) 

< max |(F*l B )(m)| + max |(F*l_ B )(m)| = 2\\B\\ U , (2) 

m^O m^O 

where the last step applied a complex conjugate to (F*l_ B )(m). As such, it suffices to show 
that random sets B have small Fourier bias: 

Lemma 7. Pick c > 4 and suppose the entries of 1 B are independent, identical Bernoulli 
random variables with mean clo h M ■ Then \\B\\ U < 3y/c- with high probability. 

To prove this lemma, we will apply the following version of the Chernoff bound: 

Lemma 8. Assume that {Zi\™ =1 are jointly independent complex random variables where 
\Z,i — E[Zj]| < 1 for all i. Take Z := Y^=i ^ an ^ denote a := ^/Var(Z). Then 

Px(\Z-E[Z}\ >ta) <4ma X {e- i2 / 8 ,e" to/ ^} 

for any t > 0. 

Proof. Let X and Y denote the real and imaginary parts of Z. Then a union bound gives 
Pr (JZ-E[Z]| > to-) = Pr (JX-E[X]| 2 + |F-E[r]| 2 > tV) 

< Pr ( |X-E[X]| > J +Pr ( |F-E[F]| > t °-\. 



V2J V V2J 

Denote ax '■= ^/Var(X) and ay := ^/Var(Y). Since Var(X) + Var(F) = Var(Z), we then 
have ax < cr and cry < a, and so 

Pr (\Z-E[Z]\ > to) < Pr (jX-E[X]| > ^|) + Pr ^|Y-E[Y]| > ^0 . 

The result then follows from applying Theorem 1.8 of [20] to both terms of the right-hand 
side. □ 

Proof of Lemma\7\ Fix m G Zm, m ^ 0, and take Z m i := |l B (m')e~ 27r * mm '/ M for each 
m' G Zm- Then {Z m >} m i e z M are jointly independent with 

|Z m ,-E[Z w ]| < \Z m ,\ + \E[Z m A\ < 1. 

As such, {Z m '} m > e z M satisfy the conditions of Lemma [8j and so we now seek the expected 
value and variance of 

Z := Z ™' = \zZ l£K)e- 2 ™' /M 
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Take p := to be the probability that l B {m') is I. Then for each S C Z M , the 

probability that B — S is j?' s '' (1 — p) M- l s L This leads to the following calculation: 



E[Z]= (\ E l5(m)e- 2 ™'/M\ |5| (1 _ p) M-|5| 

SCZ M ^ m'£Z M 



= - E E e -27rimm'/M p |5|^ 1 _ p )M-|S| 
^ SCZ M m'eS 

= - e -2nimm'/M ^ p |S| ^ _ p )M-\S\ _ 

^ m'SZ M 5D{m'} 

From here, we apply the binomial theorem to get 

SD{m'} TCZ M \{m'} 

and so K[Z] = by the geometric sum formula (recall that m ^ by choice). Next, we 
compute Var(Z) = E[\Z - E[Z]\ 2 ] = E[\Z\ 2 ] = E[ZZ\: 



Var(Z)= E Q E ls(m')e- 2mmm ' /M ) (\ E ls(m>™ mm '' /M )p |S| (l - P)^ 1 

SCZ M ^ m'eZ M ' ^ m"6Z M ' 

= - ^ e 27rim(m"- m ') ^ p |S|^ _ p )M-|S|_ 

^m'6Z M m"6Z M SD{m',m"} 

At this point, we break the sum over m" G Zm into cases in which m" = m' and m" ^ m'\ 



Var(Z) = ^ e 2mm ^ m "- m '^ ^ ^(1 - p) M ^ 

m'eZ M 5D{m',m"} 



+ 7 E E e 2 -(") £ P |5| (1"P) J 



\M-\S\ 

^ m'&M m"€Z M SD{m',m"} 



M-|S| 



= J M ^+i E E e 2mm[m "- ml) E p^'ci -p) 

m'eZ MTO "eZ M SD{m',m"} 
m"^m' 

Here, another application of the binomial theorem gives 

p lsl (i-p) M ~ lsl =P 2 E p |t| (i-p) (m - 2)HT| =p 2 , 

SD{m',m"} TCZ M \{m',m"} 

and so by the geometric sum formula, we have 

Var(Z) = ^Mp + \p 2 E e 2 ™ m{m "~ m,) 

m'eZ M m"eZ M 
in"/™' 



l M p + ^p 2 ^ (-1) = l -Mp - X -Mp 2 = ±Mp(l - p). 



4 

m'eZ M 
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Having calculated the expected value and variance of Z , we are now ready to use Lemma [8] 
(and the fact that Z = : y(i 71 *l^)(m)). A union bound gives 

Pr - w) - {M - 1)Pr (l^ 1 ^^! ^ W 

= (M — 1) Pr (|Z-E[Z]| > to) < 4(M- l)max|e-* 2/8 ,e- ia/2 ^}. 
We select t = (a/2c(1 + e) log M)/a and simplify the exponents in our probability bound: 

8 Mp(l-p) ~ Mp v ; & , 2 ^ 2 & 

With this choice of t, we continue: 

Pr(||£|| u > v / 8c(l + ^)" 1 ^) <4(M-l)max{e-( 1+£ ) lo ^^e-(v / ^)/ 2 ) 1 °s^} 

< 4max|M- £ ,M 1 -v /c ( 1+£ )/ 2 |. 

Since c > 4, we therefore have that < • with high probability. □ 

We can now combine Lemmas [6] and [7] to produce a graph G of the form in Definition [4] 
with a large spectral gap: 

Theorem 9. Pick c > 4 and suppose the entries of 1b are independent, identical Bernoulli 
random variables with mean clo h M ■ Take A := B U (— B) \ {0} and define G according to 
Definition \4\ Then 

spg(G) > 1 - 4 

wii/i /wg/i probability. 

Proof. We will assume that the events of Lemmas [6] and [7] hold simultaneously, as they will 
with high probability. Then \A\ = 2\B\, and furthermore by ([2]), ||A|| U < 2||i?|| u . Starting 
with Lemma [5j we then have 

„ „„ M „ „ M _ logM 6 
spg(G) = 1 - r-rr \\A L > 1 - — B L > 1 - 1 ?>Jc- -f— = 1 



where the second inequality applies Lemmas |6](iii) and [7j □ 

We now apply Lemma [l] to show that O(logM) Fourier masks suffice for injectivity: 

Theorem 10. Take K = 12, c = 144, suppose the entries of 1b are independent, identical 
Bernoulli random variables with mean clo h M , and take A := BU (-B) \ {0}. Then with high 
probability, the < 2 ■ 10 5 • log M masks 

{DkYk=Q u i D k + u r E a D k i} k ~^ fe/=0j ;L 0) agA , 

as described in Lemmas \^ and [3| lend infective intensity measurements. 

Proof. Note that G has n = KM vertices and is <i-regular with d = K\A\. We need to be 
robust to the removal of any M — 1 vertices, which in turn removes d(M — 1) edges, and so 
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it suffices to be robust to the removal of any dM edges. As such, we observe Lemma [T] and 
take £ to satisfy 

1 _ dM _ spg(C7) 

K~ dn~ - 6 ■ [6) 
We also want a connected component of size M after the removal of these edges, and so by 
Lemma [TJ it suffices to have 

M<(l- -*) n = (l - 2 )KM. (4) 

Rearranging ffify and Q produces the following specification on the spectral gap of G for 
sufficient connectivity: 

spg(G)>max{|, ] ^ T } = |, 

where the equality is valid provided K > 2. To meet this specification, based on Theorem [9j 
it suffices to have 1 — 6/y/c > Q/K, which requires K > 6 and is equivalent to having 
c > (1/6 — 1/K)~ 2 . Using Lemma [6j the total number of masks is 

K + 3 + | A | = K + 6 (* 2 +1 )|B|<A- + 9 (* + l ) clog M. 

Setting c = (1/6 — 1/K)~ 2 , then K = 12 minimizes the coefficient of logM. □ 

At this point, we note that 2 • 10 5 • logM is a rather large number of Fourier masks. 
Certainly, the 10 5 might be an artifact of our analysis — perhaps it could be decreased by 
leveraging better approximations. However, as the next result shows, the log factor is nec- 
essary for the Fourier masks to lend polarization-based recovery: 

Theorem 11. Take G as defined in Definition^ Then spg(C7) > e only if 

logM 



A > 



2 + log(l/e) - 



Proof. Define V to be the \A\ x M matrix built from taking the rows of F indexed by A 
and then scaling the columns to have unit norm. Then the inner product between any two 
columns of V is given by 



■> = £ 



aeA 



_ 2mma/M \ I _ -2mm' a/ M 



A J\J\A 



A\ M ^— ' " v ' \A\ 

As such, the worst-case coherence between columns of V can be expressed in terms of the 
Fourier bias of A (and the spectral gap of G by Lemma [5]): 

M 

max \(v m ,v m >)\ = — \\A\\ U = 1 - spg(G) < 1 - e. (5) 

m,m £%m Lrl 

As you might expect, unit vectors can only be incoherent if there are enough dimensions 
relative to the number of vectors. To establish this explicitly, view the V m S cLS vectors in 
M 2 I A I by stacking the real and imaginary parts of each entry. Letting 5 denote the shortest 
distance between distinct u m 's (this is the same in C' A ' and IR 2 ' A '), consider the open balls 
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of radius 5/2 centered at each v m . The disjoint union of these M balls is contained in the 
ball of radius 1 + 5/2 centered at the origin, and so a volume comparison gives 



5\2\A\ 



M.C(f) 2|A| =Vol( □ S( Vrai f))<VQl(B(O l l + i))=C(l + f) 
Rearranging this inequality then yields 

log M 

lAl ~ 21og(2/5 + l)- (6) 
To analyze 5, note that for any m, m! G Z^, we have 

\\v m ~ v m '\\ 2 = \\v m \\ 2 - 2Re(v m , v m >) + \\v m/ \\ 2 > 2 - 2\(v m , v m >) \ > 2e, 
where the last step is by dq). Thus 5 2 > 2e, with which we continue 

logM logM 

\A\ > - — > 



21og(v^7i+l) " 2 + log(l/e)' 
Here, the last inequality can be verified using the fact that e < 1. □ 

4. Numerical simulations 

In this section, we compare the polarization method of this paper to a state-of-the-art 
phase retrieval algorithm known as PhaseLift [9] . The main idea behind PhaseLift is that 
the intensity measurement x i— > \(x, ip)\ 2 can be viewed as a linear measurement if the signal 
x is "lifted" to the outer product xx* in the real vector space of self-adjoint M x M matrices. 
Indeed, the intensity measurement is a Hilbert-Schmidt inner product in this vector space: 

\{x, ip)\ 2 = ip*xx*(p = Tr[Lp*xx*(p] = Ti[LpLp* xx*] = (xx*, v?V 9 *)hs- 

However, this larger vector space has M 2 dimensions, and so M 2 inner products are necessary 
to identify any member of the space — that is, unless more information is available. In this 
case, we know that the desired self-adjoint matrix xx* is positive semidefinite with rank 1, 
and so one could seek to minimize rank over the positive semidefinite matrices with the given 
intensity measurements. Rank minimization tends to be a difficult program to solve, so one 
is inclined to relax it: 

(PhaseLift) minimize Tr(X) s.t. A(X) = 6, X y 0. 

To date, PhaseLift is known to stably reconstruct all signals in C M with only 0(M) 
complex Gaussian measurements [8]. However, very little is known about its performance 
when the measurement vectors do not come from a unitarily invariant distribution (e.g., in 
the Fourier masks setting). To be fair, the present paper also does not prove stable recon- 
struction from Fourier masks; indeed, the previous section "merely" established injectivity 
with efficient recoverability. Therefore, for the sake of completeness, this section will provide 
numerical simulations that compare the stability of both phase retrieval algorithms. 

First, we describe how we implement phase retrieval with polarization. Starting with an 
ensemble of vertex measurement vectors $y and edge measurement vectors $£, then for 
each edge G E, we apply ([!]) to get 

1 2 

o ^2 u > r (\(x,(p i +u) r (p j )\ 2 + e ijr ), 



r=0 
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where Eij r denotes the noise added to the r)th edge measurement. Later, we will nor- 
malize Wij in order to estimate the relative phase between the measurements at vertices % 
and j, but if Wij is small, this normalization will be particularly susceptible to noise. As 
such, we first remove vertices which are adjacent to small edge weights so as to promote 
reliability in the edges. 

Algorithm 1: Pruning for reliability 
Input: Weighted graph G = (V, E, W), pruning parameter a 
Output: Subgraph G' with a larger smallest edge weight 
Initialize G' G 
for i = 1 to - a)\V\\ do 

Find edge G E of minimal weight 

G'^G'\{i,j} 
end 



Now that we have isolated a subgraph with reliable edges, we want to find a further 
subgraph which is sufficiently connected so that its vertices can democratically agree on the 
relative phases. To find this subgraph, we iteratively remove vertices implicated by spectral 
clustering [31 HI \10\ until the spectral gap is sufficiently large. 

Algorithm 2: Pruning for connectivity 
Input: Graph G' = (V',E'), pruning parameter r 
Output: Subgraph G" with spg(G") > r 
Initialize G" <- G' 
while spg(G") < r do 

Take D to be the diagonal matrix of vertex degrees 
Compute the Laplacian L «— / — D^ 1 / 2 AD~ X I 2 

Compute the eigenvector u corresponding to the second eigenvalue of L 

for 2 = 1 to \V"\ do 

Let Si denote the vertices corresponding to the i smallest entries of D~ 1 / 2 u 
Let E(Si,Sf) denote the number of edges between Si and Sf 
hi 4- E(Si, Sf)f mm{J2 veSi deg(v), ^ veS c deg(v)} 

end 

Take S to be the Si of minimal hi 
G" ^G"\S 
end 



At this point, our graph has reliable edges and is well connected. We now run angular 
synchronization [HI HH] to reach a consensus on the phases of the vertex measurements, up 
to a global phase factor. 

Having estimated the phases of the vertex measurements corresponding to V", we now 
multiply the square roots of the vertex measurements {^\(x, <Pi)\ 2 + E^} ie v" by these phases 
to estimate the inner products {{x,ipi)}i e v"> an d then produce a least-squares estimate for 
the desired signal x. As established in pQ, this implementation of the polarization method is 
stable to noise when the vertex measurement vectors are complex Gaussian. In this section, 
we run numerical simulations to illustrate stability in the Fourier masks setting. 
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Algorithm 3: Angular synchronization 
Input: Graph G" = (V", E", W") 

Output: Vector of phases corresponding to vertex measurements 

Let A% denote the weighted adjacency matrix with the weights normalized 

Let D denote the diagonal matrix of vertex degrees 

Compute the connection Laplacian L\ ^— / — D~ l l 2 A\D~ X I 2 

Compute the eigenvector u corresponding to the smallest eigenvalue of L\ 

Output the phases of the coordinates of u 



In the experiments that follow, we run the above implementation of polarization using 
pruning parameters a = 0.99, r = 0.1, and the following construction of masks: First, we 
draw K = 3 masks whose diagonal entries are independent with distribution A/"(0, 1). Next, 
we draw M — 1 independent realizations of a random variable X with uniform distribution 
over the interval [0,1], and we put i G B whenever Aj < ° S M . This way, B C 7j M is a subset 
that does not contain 0, but the nonzero members of Zjh each reside in B with probability 



log — thereby following the intent of Theorem 10 Since ^ B, we take A = B U (— B), and 



M 



then we construct the auxiliary masks {D k + u r E a D k i} 2 =0 for every (k,k') G {0,1, 2}, k > k' 
and a G A. Overall, in these experiments, polarization will use a total of 18\A\ +3 masks, 
where \A\ tends to be approximately 21ogM. 

By contrast, PhaseLift offers a lot more flexibility with the number and types of masks 
used, and this flexibility allows us to run a collection of choice comparisons. We will run 
experiments at three noise levels, and for each level, we will compare polarization to three dif- 
ferent implementations of PhaseLift. In the first implementation, we will only give PhaseLift 
the measurements corresponding to the original 3 masks we give to polarization. Next, we 
will give PhaseLift all 18\A\ + 3 of the masks we described in the previous paragraph. For 
the last comparison, we will give PhaseLift the original 3 masks along with 18\A\ addi- 
tional masks whose diagonal entries are also independent with distribution J\f(0, 1). Based 
on discussions in [7j, this last setup appears to be the intended design of Fourier masks for 
PhaseLift, so in a sense, this last comparison will allow both algorithms to compete with 
their own "home-field advantage." 

In the comparison between polarization and PhaseLift, we use two performance metrics: 
run time and relative error of reconstruction, defined in this setting by 



\\cx — x\ 

min 



.fc 1Mb 

|c|=l 

Here, c is playing the role of the global phase factor we lost in the intensity measurement 
process. For each noise level tested a 2 G {0,0.1,1}, and for each type of mask ensemble 
given to PhaseLift (original 3, same 18\A\ + 3, random 18\A\ + 3), we considered multiple 
signal dimensions M G {2 5 , 2 6 , 2 7 , 2 8 , 2 9 }. In each of these scenarios, we ran 30 realizations 
of the following experiment: 

• Draw each entry of the signal x independently from Af(0, 1). 

• Add independent A/"(0, a 2 ) noise to each intensity measurement \(x,<p)\ 2 . 

• Run both polarization and PhaseLift to estimate x and record the run time and 
relative error. 
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We ran these experiments on an Intel® Core™2 Quad CPU Q9550 at 2.83GHz with 3.6 
GB of memory. 

Figure [l] presents the results of the noiseless scenarios (a 2 = 0). Here, perhaps the most 
striking distinction between polarization and PhaseLift is the average run time; indeed, po- 
larization produces an estimate in a matter of seconds, whereas PhaseLift typically takes 
around an hour to terminate in the higher-dimensional scenarios. Letting N denote the 
number of intensity measurements, then PhaseLift takes 0(N 3 ' 5 ) operations, whereas the 
computational bottleneck in polarization is the least-squares step, which takes 0(N 3 ) op- 
erations, but this does not account for the disparity in run time. Rather, the disparity is 
indicative of a large constant in front of the N 3 ' 5 . In comparing the relative error in re- 
construction, we note that PhaseLift terminates when the step size is sufficiently small, so 
this accounts for the significant difference in performance. Overall, in the noiseless case, 
polarization handily defeats PhaseLift, even when PhaseLift is given the random masks it 
prefers. 

Next, Figure [2] illustrates how polarization and PhaseLift perform in the present of some 
noise (a 2 = 0.1). In this case, PhaseLift continues to be rather slow, and it is clear that 
polarization enjoys greater stability from the 18\A\ additional masks when compared to 
PhaseLift's performance with just the original 3 random masks. In [7], it is claimed that 3 
random masks are sufficient for PhaseLift to reconstruct typical signals (and this is corrob- 
orated with our experiments in the noiseless case), but Figure [2] illustrates that PhaseLift 
requires more masks in order to reconstruct stably. And indeed, when PhaseLift is given more 
masks, it performs much better. Specifically, when they use the same masks, polarization 
and PhaseLift produce estimates with statistically comparable relative error, and PhaseLift 
performs slightly better when given the random masks it prefers. Intuitively, it makes sense 
that PhaseLift should be at least slightly more stable than polarization since it leverages 
all of the measurements in a democratic fashion, whereas polarization disenfranchises the 
vertex and edge measurements that are too small (i.e., "wasting" measurements). 

Figure [3] presents the results of our high-noise scenario (a 2 = 1), which are very similar to 
the results corresponding to o 2 = 0.1. 



5. Discussion 

In this paper, we showed how to leverage the polarization-based phase retrieval technique 
to construct G(logM) Fourier masks that uniquely determine every M-dimensional signal, 
and then we used numerical simulations to illustrate the stability of polarization with such 
masks. At this point, we offer two conclusions: 

(i) The polarization technique for phase retrieval is flexible enough to provably accom- 
modate certain measurement design criteria (such as Fourier masks). 

(ii) If you have the ability to use polarization instead of PhaseLift, you will gain consid- 
erable speedups in run time at the price of a slight increase in relative error. 

It remains to be rigorously proved that stability holds in the Fourier masks setting. We 
did not exploit the inherent Fourier structure to further speed up reconstruction, though 
this could very well be possible. The number of masks might be decreased if additional 
information about the signal were leveraged, as in [T2J [IB]. Finally, we still seek Fourier 
masks-based performance guarantees for PhaseLift and its modifications [TT] |2"T|. 
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Figure 1. Phase retrieval without noise. For the first row of graphs, 
PhaseLift is given the original K = 3 random masks, for the second, it is 
given all 18\A\ +3 of the masks given to polarization, and in the last row, 
PhaseLift is given 18\A\ + 3 random masks. The plotted data illustrate the 
sample mean plus/minus one sample standard deviation. In some cases, the 
lower error bar is negative, and so it is not plotted in the log scale. 
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Figure 2. Phase retrieval with jV(0, 0.1) noise added to each intensity mea- 
surement. For the first row of graphs, PhaseLift is given the original K = 3 
random masks, for the second, it is given all 18\A\ +3 of the masks given to 
polarization, and in the last row, PhaseLift is given 18\A\ +3 random masks. 
The plotted data illustrate the sample mean plus/minus one sample standard 
deviation. In some cases, the lower error bar is negative, and so it is not 
plotted in the log scale. 
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Figure 3. Phase retrieval with jV(0, 1) noise added to each intensity mea- 
surement. For the first row of graphs, PhaseLift is given the original K = 3 
random masks, for the second, it is given all 18\A\ +3 of the masks given to 
polarization, and in the last row, PhaseLift is given 18\A\ +3 random masks. 
The plotted data illustrate the sample mean plus/minus one sample standard 
deviation. In some cases, the lower error bar is negative, and so it is not 
plotted in the log scale. 
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